    The Pulse of News in Social Media: Forecasting Popularity

    News articles are extremely time-sensitive by nature. There is also intense competition among news items to propagate as widely as possible. Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging. Prior research has dealt with predicting eventual online popularity based on early popularity. It is most desirable, however, to predict the popularity of items prior to their release, fostering the possibility of appropriate decision making to modify an article and the manner of its publication. In this paper, we construct a multi-dimensional feature space derived from properties of an article and evaluate the efficacy of these features to serve as predictors of online popularity. We examine both regression and classification algorithms and demonstrate that, despite randomness in human behavior, it is possible to predict ranges of popularity on Twitter with an overall accuracy of 84%. Our study also serves to illustrate the differences between traditionally prominent sources and those immensely popular on the social web.
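    The abstract frames the task as predicting ranges of popularity and reports an overall accuracy. As a rough illustration only, the Python sketch below buckets raw share counts into ranges and scores range predictions by overall accuracy. The bin edges (BIN_EDGES) and the helpers popularity_class and overall_accuracy are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical bin edges on raw share counts (NOT the paper's ranges).
# Classes: 0 -> <10, 1 -> 10-99, 2 -> 100-999, 3 -> >=1000
BIN_EDGES = [10, 100, 1000]


def popularity_class(shares: int) -> int:
    """Map a raw share count to a popularity-range label."""
    # bisect_right counts how many edges are <= shares, i.e. the range index.
    return bisect_right(BIN_EDGES, shares)


def overall_accuracy(true_counts, predicted_classes) -> float:
    """Fraction of items whose predicted range equals the true range."""
    true_classes = [popularity_class(c) for c in true_counts]
    hits = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return hits / len(true_classes)
```

    For example, overall_accuracy([5, 50, 500, 5000], [0, 1, 2, 2]) returns 0.75, since three of the four predicted ranges match.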

    Blind Men and the Elephant: Detecting Evolving Groups In Social News

    We propose an automated and unsupervised methodology for a novel summarization of group behavior based on content preference. We show that graph-theoretical community evolution (based on similarity of user preference for content) is effective in indexing these dynamics. Combined with text analysis that targets automatically identified representative content for each community, our method produces a novel multi-layered representation of evolving group behavior. We demonstrate this methodology in the context of political discourse on a social news site with data that spans more than four years, and find coexisting political leanings over extended periods and a disruptive external event that led to a significant reorganization of existing patterns. Finally, where no ground truth exists, we propose a new evaluation approach that uses entropy measures as evidence of coherence along the evolution path of these groups. This methodology is valuable to designers and managers of online forums in need of granular analytics of user activity, as well as to researchers in the social and political sciences who wish to extend their inquiries to large-scale data available on the web. Comment: 10 pages, icwsm201
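    The abstract proposes entropy measures as evidence of coherence along a group's evolution path but does not define them here. A minimal sketch, assuming coherence is read as low Shannon entropy of some categorical label (for example, a topic label) over a community's items at each time step; the function names and the choice of labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels) -> float:
    """Shannon entropy (bits) of the empirical distribution of labels.

    0.0 means every item shares one label (maximally coherent);
    larger values mean the labels are spread across more categories.
    """
    counts = Counter(labels)
    n = len(labels)
    # p * log2(1/p), written as (c/n) * log2(n/c) to avoid a negative zero.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_along_path(path):
    """Entropy of each community snapshot along an evolution path.

    `path` is a hypothetical list of snapshots, one per time step,
    each snapshot being the list of labels of that community's items.
    """
    return [shannon_entropy(snapshot) for snapshot in path]
```

    Under this reading, a path whose entropies stay low over time would suggest a group that remains topically coherent as it evolves.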

    Predicting Rising Follower Counts on Twitter Using Profile Information

    When evaluating the cause of one's popularity on Twitter, one thing is considered to be the main driver: many tweets. There is debate about what kind of tweets one should publish, but little attention is paid to anything beyond tweets. Of particular interest is the information provided by each Twitter user's profile page. One such feature is the given name on the profile. Studies in psychology and economics have identified correlations between one's first name and, e.g., school marks or the chances of getting a job interview in the US. We are therefore interested in the influence of this profile information on the follower count. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users whose name field contains a first name, English words, or neither. The assumption is that names and words influence the discoverability of a user and subsequently his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying different models based on the user's group. The classifiers are evaluated by the area under the receiver operating characteristic (ROC) curve and achieve a score above 0.800. Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy, NY, US
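    The approach splits users by the content of their name field and evaluates the per-group classifiers by ROC AUC. The sketch below assumes tiny placeholder lexicons (FIRST_NAMES, ENGLISH_WORDS) and checks first names before English words, which is an assumed precedence. It computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as one half. None of these names come from the paper.

```python
# Hypothetical placeholder lexicons; a real system would use large name/word lists.
FIRST_NAMES = {"anna", "james", "maria"}
ENGLISH_WORDS = {"news", "music", "coffee"}


def name_group(name_field: str) -> str:
    """Assign a profile to one of three groups based on its name field."""
    tokens = name_field.lower().split()
    # Assumed precedence: a first name wins over an English word.
    if any(t in FIRST_NAMES for t in tokens):
        return "first_name"
    if any(t in ENGLISH_WORDS for t in tokens):
        return "english_word"
    return "neither"


def roc_auc(labels, scores) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 if the user's follower count rose within the window, else 0.
    scores: classifier scores, higher meaning "more likely to rise".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

    The pairwise loop is quadratic and only meant to make the definition explicit; for millions of profiles a rank-based computation or a library routine would be the practical choice.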

    An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

    Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework composed of different actants (people, places, things), their roles, and interactions that we label the "consensus narrative framework". We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs/subnetworks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative machine learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report an average coverage/recall of important relationships of >80% and an average edge detection rate of >89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.
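    A minimal sketch of an actant-relationship story graph as a directed multigraph that permits multi-edges and self-loops, plus a simplified recall over actant pairs. Assumption: edges are matched by exact actant pair, whereas the abstract compares relationships using word embedding tools. The class and function names, and the actant names used in any example, are hypothetical.

```python
from collections import defaultdict


class ActantGraph:
    """Directed multigraph: nodes are actants, edges carry relationship labels.

    Several labels may connect the same pair (multi-edges), and an actant
    may be linked to itself (self-loops).
    """

    def __init__(self):
        # (source, target) -> list of relationship labels
        self.edges = defaultdict(list)

    def add_relation(self, source: str, target: str, label: str) -> None:
        self.edges[(source, target)].append(label)

    def actants(self) -> set:
        nodes = set()
        for source, target in self.edges:
            nodes.update((source, target))
        return nodes


def edge_recall(extracted: ActantGraph, ground_truth: ActantGraph) -> float:
    """Fraction of ground-truth actant pairs also present in the extracted graph.

    Simplification: exact pair matching, ignoring relationship labels,
    instead of the embedding-based comparison described in the abstract.
    """
    truth_pairs = set(ground_truth.edges)
    if not truth_pairs:
        return 0.0
    return len(truth_pairs & set(extracted.edges)) / len(truth_pairs)
```

    For instance, if a ground-truth graph links ("Hero", "Mentor"), ("Villain", "Hero") and the self-loop ("Hero", "Hero"), and an extracted graph recovers only the first and last pairs, edge_recall returns 2/3.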

    Gestalt Computing and the Study of Content-oriented User Behavior on the Web

    Elementary actions online establish an individual's existence on the web and her/his orientation toward different issues. In this sense, actions truly define a user in spaces like online forums and communities, and the aggregate of elementary actions shapes the atmosphere of these online spaces. This observation, coupled with the unprecedented scale and detail of data on user actions on the web, compels us to use them in understanding collective human behavior. Despite large investments by industry to capture this data and the expanding body of research on big data in academia, gaining insight into collective user behavior online has been elusive. If one is indeed able to overcome the considerable computational challenges posed by both the scale and the inevitable noisiness of the associated data sets, one could provide new automated frameworks to extract insights into evolving behavior at different scales, and to form an altogether different perspective of aggregated elementary user actions. This thesis addresses this fundamental and pressing problem and offers a gestalt computing approach to studying complex social phenomena in large datasets. This approach involves extracting macro structures from aggregated user actions, finding their possible meanings, and arranging data in layers so that it is iteratively explorable. The dissertation includes three major sections: first, modeling and prediction of the diffusion of information by users on the social web; next, detection of topics promoted by user communities; and finally, presentation of the gestalt computing framework through a methodology that uses graph theory, language processing, and information theory to provide a top-down map of group dynamics on social news websites. What we find is not only statistical significance in the extracted structure, but also that the results are meaningful to human understanding. The efficacy of the proposed methodologies is established via multiple real-world data sets.